Very fast algorithms for evaluating the stability of ML and Bayesian phylogenetic trees from sequence data.

Authors

  • Peter J Waddell
  • Hirohisa Kishino
  • Rissa Ota
Abstract

Evolutionary trees sit at the core of all realistic models describing a set of related sequences, including alignment, homology search, ancestral protein reconstruction and 2D/3D structural change. It is important to assess the stochastic error when estimating a tree, including for models using the most realistic likelihood-based optimizations, yet computation times may be many days or weeks. If so, the bootstrap is computationally prohibitive. Here we show that the extremely fast "resampling of estimated log likelihoods" or RELL method behaves well under more general circumstances than previously examined. RELL approximates the bootstrap proportions (BP) of trees better than some bootstrap methods that rely on fast heuristics to search the tree space. The BIC approximation of the Bayesian posterior probability (BPP) of trees is made more accurate by including an additional term related to the determinant of the information matrix (which may also be obtained as a product of gradient or score vectors). Such estimates are shown to be very close to MCMC chain values. Our analysis of mammalian mitochondrial amino acid sequences suggests that when model breakdown occurs, as it typically does for sequences separated by more than a few million years, the BPP values are far too peaked and the real fluctuations in the likelihood of the data are many times larger than expected. Accordingly, several ways to incorporate the bootstrap and other types of direct resampling with MCMC procedures are outlined. Genes evolve by a process which involves some sites following a tree close to, but not identical with, the species tree. It is seen that under such a likelihood model BP and BPP estimates may still be reasonable estimates of the species tree. Since many of the methods studied are very fast computationally, there is no reason to ignore stochastic error even with the slowest ML or likelihood-based methods.
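
The two fast estimators named in the abstract can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the function names, the `site_lnl` layout (one row of per-site log likelihoods per candidate tree, computed once at each tree's ML estimates) and the equal-prior assumption are all hypothetical. The second function is the plain BIC approximation only; the additional information-matrix determinant term described in the abstract is deliberately left out because the abstract does not give its exact form.

```python
import math
import random


def rell_proportions(site_lnl, replicates=1000, seed=0):
    """Estimate RELL bootstrap proportions for a set of candidate trees.

    site_lnl[t][s] is the log likelihood of site s under tree t
    (hypothetical layout, assumed precomputed at the ML estimates).
    """
    rng = random.Random(seed)
    n_trees = len(site_lnl)
    n_sites = len(site_lnl[0])
    wins = [0] * n_trees
    for _ in range(replicates):
        # Resample sites with replacement; parameters are NOT re-optimized,
        # which is what makes RELL so much cheaper than a full bootstrap.
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = [sum(row[s] for s in sample) for row in site_lnl]
        best = max(range(n_trees), key=totals.__getitem__)
        wins[best] += 1
    return [w / replicates for w in wins]


def bic_posterior_weights(lnl, n_params, n_sites):
    """Plain BIC approximation to tree posterior probabilities.

    Assumes equal prior probability for every tree; omits the
    determinant correction discussed in the abstract.
    """
    scores = [l - 0.5 * k * math.log(n_sites) for l, k in zip(lnl, n_params)]
    top = max(scores)  # subtract the maximum to avoid underflow in exp()
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]
```

For example, `rell_proportions(site_lnl, replicates=10000)` returns one proportion per candidate tree, and `bic_posterior_weights([-1200.5, -1203.1], [25, 25], 400)` returns normalized weights for two trees with equal parameter counts (the numbers here are made up for illustration).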

Similar resources

Evolutionary history of subspecies of Eurasian nuthatches (Sitta europaea persica) from Zagros Mountains, Iran

Abstract. Eurasian Nuthatch (Sitta europaea), with 18 subspecies, has a wide distribution in deciduous forests of Eurasia. The subspecies S.e.persica is a resident bird in the Zagros Mountains, from north-west to south-west of Iran. The aim of this study was to evaluate the taxonomic and phylogenetic relationships of this subspecies to European, Asian, as well as Caucasian clades. For this purp...

Molecular Identification of the Persian Gulf Sea Hare (Aplysia sp.) Based on 16s rRNA Gene Sequence

Background: Sea hares of the Aplysia genus are among the mollusks of interest for various researchers to study their phylogeny, bioactive compounds and the nervous system. These mollusks are herbivorous and produce chemical compounds (ink) to defend themselves. The present study provided molecular identification of the Persian Gulf (Bushehr city) sea hare using 16s rRNA gene sequence. Materials...

Online Bayesian phylogenetic inference: theoretical foundations via Sequential Monte Carlo.

Phylogenetics, the inference of evolutionary trees from molecular sequence data such as DNA, is an enterprise that yields valuable evolutionary understanding of many biological systems. Bayesian phylogenetic algorithms, which approximate a posterior distribution on trees, have become a popular if computationally expensive means of doing phylogenetics. Modern data collection technologies are qui...

Calculating the evolutionary rates of different genes: a fast, accurate estimator with applications to maximum likelihood phylogenetic analysis.

In phylogenetic analyses with combined multigene or multiprotein data sets, accounting for differing evolutionary dynamics at different loci is essential for accurate tree prediction. Existing maximum likelihood (ML) and Bayesian approaches are computationally intensive. We present an alternative approach that is orders of magnitude faster. The method, Distance Rates (DistR), estimates rates ba...

Fast Hashing Algorithms to Summarize Large Collections of Evolutionary Trees

Different phylogenetic methods often yield different inferred trees for the same set of organisms. Moreover, a single phylogenetic approach (such as a Bayesian analysis) can produce many trees. Consensus trees and topological distance matrices are often used to summarize the evolutionary relationships among the trees of interest. These summarization techniques are implemented in current phyloge...

Journal title:
  • Genome informatics. International Conference on Genome Informatics

Volume 13

Publication date 2002